A "wireframe" is a line-segment-based representation designed to capture the large-scale visual properties of the regular, structured man-made scenes around us. Unlike wireframes, conventional edge or line-segment detection focuses on all visible edges and lines without particularly distinguishing those that are more salient for man-made structural information. Existing wireframe detection models rely on supervision with annotated data but do not explicitly learn how the structural shapes of a scene are composed. Moreover, we often face many foreground objects that occlude the background scene and interfere with proper inference of the complete scene structure behind them. To address these problems, we propose, for the first time in this field, new conditional data generation and training that help the model learn to ignore occlusions indicated by holes, such as foreground-object regions masked out of the image. In addition, we are the first to incorporate a GAN into the model, letting it better predict the underlying scene structure even beyond large holes. We also introduce pseudo-labeling to further enlarge the model capacity and overcome the small scale of labeled data. We show, qualitatively and quantitatively, that our approach significantly outperforms previous works that cannot handle holes, and also improves ordinary detection when no holes are given.
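The conditional data generation described above amounts to synthesizing occlusions: foreground-object regions are masked out of the input image while the wireframe labels stay intact, so the detector learns to infer the structure hidden behind the holes. A minimal sketch of that idea, where the rectangular hole shapes and sizes are illustrative assumptions rather than the authors' recipe:

```python
import numpy as np

def mask_random_holes(image, num_holes=3, max_frac=0.3, rng=None):
    """Zero out random rectangular regions to mimic occluding foreground objects.

    The wireframe labels are left untouched, so the model is encouraged to
    infer the structure hidden behind the holes.
    """
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    masked = image.copy()
    hole_mask = np.zeros((h, w), dtype=bool)
    for _ in range(num_holes):
        hh = rng.integers(1, max(2, int(h * max_frac)))
        ww = rng.integers(1, max(2, int(w * max_frac)))
        top = rng.integers(0, h - hh)
        left = rng.integers(0, w - ww)
        masked[top:top + hh, left:left + ww] = 0
        hole_mask[top:top + hh, left:left + ww] = True
    return masked, hole_mask

# Example: a 256x256 RGB image with three synthetic holes.
img = np.random.rand(256, 256, 3).astype(np.float32)
masked_img, holes = mask_random_holes(img, num_holes=3)
```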
Modern image inpainting systems, despite significant progress, often struggle with large missing areas, complex geometric structures, and high-resolution images. We find that one of the main reasons is the lack of an effective receptive field in both the inpainting network and the loss function. To alleviate this issue, we propose a new method called large mask inpainting (LaMa). LaMa is based on i) a new inpainting network architecture that uses fast Fourier convolutions (FFCs), which have an image-wide receptive field; ii) a high-receptive-field perceptual loss; and iii) large training masks, which unlock the potential of the first two components. Our inpainting network improves the state of the art across a range of datasets and achieves excellent performance even in challenging scenarios, e.g., completion of periodic structures. Our model generalizes surprisingly well to resolutions higher than those seen at training time, and achieves this at lower parameter and time costs than competitive baselines. The code is available at https://github.com/saic-mdal/lama.
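The architectural ingredient the LaMa abstract names is the fast Fourier convolution (FFC), whose global branch mixes features in the frequency domain and therefore sees the whole image at once. Below is a minimal PyTorch sketch of such a spectral branch; the channel sizes, normalization, and the way LaMa combines local and global branches are simplifications, not the released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class SpectralBranch(nn.Module):
    """Global branch of a fast-Fourier-convolution-style block (simplified).

    A real FFT turns the feature map into frequencies, a 1x1 convolution mixes
    real and imaginary parts, and an inverse FFT returns to the spatial domain,
    so every output position can depend on every input position.
    """

    def __init__(self, channels):
        super().__init__()
        # Real and imaginary parts are stacked along the channel axis.
        self.freq_conv = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.BatchNorm2d(2 * channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        freq = torch.fft.rfft2(x, norm="ortho")            # (b, c, h, w//2 + 1), complex
        freq = torch.cat([freq.real, freq.imag], dim=1)    # (b, 2c, h, w//2 + 1)
        freq = self.freq_conv(freq)
        real, imag = freq.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")

x = torch.randn(1, 64, 128, 128)
out = SpectralBranch(64)(x)   # same shape as the input, but with a global receptive field
```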
Consensus clustering aggregates partitions in order to find a better fit by reconciling clustering results from different sources/executions. In practice, clustering tasks contain noise and outliers, which may significantly degrade performance. To address this issue, we propose a novel algorithm -- robust consensus clustering -- that finds the common ground truth among experts' opinions while being minimally affected by the bias caused by outliers. In particular, we formalize robust consensus clustering as a constrained optimization problem and then derive an effective algorithm based on the alternating direction method of multipliers (ADMM) with a rigorous convergence guarantee. Our method outperforms the baselines on benchmarks. We apply the proposed method to real-world advertising campaign segmentation and forecasting tasks, using the consensus clustering results built on similarities computed via the Kolmogorov-Smirnov statistic. The accurate clustering results help build advertiser profiles for forecasting.
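The similarity mentioned at the end, computed via the Kolmogorov-Smirnov statistic, is straightforward to illustrate: the KS statistic measures the maximum gap between two empirical CDFs, so one minus the statistic can serve as a similarity. A small sketch (the campaign features and the 1 - statistic conversion are assumptions for illustration):

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_similarity(series_a, series_b):
    """Similarity between two empirical distributions via the KS statistic.

    ks_2samp returns the maximum distance between the two empirical CDFs;
    subtracting it from 1 turns the distance into a similarity in [0, 1].
    """
    statistic, _p_value = ks_2samp(series_a, series_b)
    return 1.0 - statistic

# Example: daily click-through rates of two advertising campaigns.
rng = np.random.default_rng(0)
campaign_a = rng.normal(0.02, 0.005, size=90)
campaign_b = rng.normal(0.025, 0.005, size=90)
print(ks_similarity(campaign_a, campaign_b))
```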
In computational advertising, a challenging problem is how to recommend bids that help advertisers achieve the best return on investment (ROI) under a budget constraint. This paper presents a bid recommendation scenario that discovers concavity changes in click prediction curves. The recommended bid is derived from the turning point where the curve switches from significant increase (concave downward) to slow increase (convex upward). A parametric-learning-based method is applied by solving the corresponding constrained optimization problem. Empirical studies on real-world advertising scenarios clearly demonstrate the performance gains on business metrics (including revenue increase, click increase, and advertiser ROI increase).
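Numerically, the turning point described above is where the second derivative of the fitted click curve changes sign. A minimal sketch on a synthetic curve (the S-shaped click model below is an illustrative stand-in, not the paper's parametric form):

```python
import numpy as np

def find_turning_bid(bids, clicks):
    """Return the bid where the numerical second derivative changes sign.

    `bids` must be sorted; `clicks` is the predicted click volume at each bid.
    """
    first = np.gradient(clicks, bids)
    second = np.gradient(first, bids)
    sign_flips = np.where(np.diff(np.sign(second)) != 0)[0]
    if sign_flips.size == 0:
        return None
    return bids[sign_flips[0] + 1]

# Synthetic click-prediction curve: clicks saturate as the bid grows.
bids = np.linspace(0.1, 5.0, 200)
clicks = 1000 / (1 + np.exp(-2.0 * (bids - 2.0)))   # illustrative S-shaped curve
print(find_turning_bid(bids, clicks))               # roughly the curve's inflection point
```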
In cost-per-click (CPC) or cost-per-impression (CPM) advertising campaigns, advertisers always run the risk of spending the budget without getting enough conversions. Moreover, the bids placed on advertising inventory are only loosely connected to the propensity of reaching target cost-per-acquisition (tCPA) goals. To address this problem, this paper presents a bid optimization scenario that achieves the desired tCPA goals for advertisers. In particular, we build an optimization engine that makes decisions by solving a rigorously formalized constrained optimization problem, which leverages a bid landscape model learned from rich historical auction data using non-parametric learning. The proposed model naturally recommends bids that meet advertisers' expectations by making inferences over their historical auction behaviors, and in doing so handles the data challenges commonly faced by bid landscape modeling: incomplete auction logs, and uncertainty due to variation and fluctuations in advertising bidding behaviors. The bid optimization model outperforms the baseline methods on real-world campaigns and has been applied to a wide range of scenarios for performance improvement and revenue lift.
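At its core, the engine described above solves a constrained problem: choose the bid that maximizes expected conversions while the predicted cost per acquisition stays at or below the target. A toy sketch with scipy, in which win_rate, expected_cpc, and the conversion rate are illustrative stand-ins for the non-parametric bid landscape model:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative stand-ins for a learned bid landscape model.
def win_rate(bid):            # probability of winning the auction at this bid
    return 1.0 - np.exp(-0.8 * bid)

def expected_cpc(bid):        # expected cost per click when winning
    return 0.7 * bid

CVR = 0.05                    # click-to-conversion rate
TARGET_CPA = 8.0              # advertiser's tCPA goal

def negative_conversions(bid):
    """Maximize conversions, with a soft penalty when predicted CPA exceeds tCPA."""
    conversions = win_rate(bid) * CVR
    cpa = expected_cpc(bid) / CVR
    penalty = 1e3 * max(0.0, cpa - TARGET_CPA) ** 2
    return -conversions + penalty

result = minimize_scalar(negative_conversions, bounds=(0.01, 20.0), method="bounded")
print("recommended bid:", result.x)
```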
We propose a new neural network design paradigm, the Reversible Column Network (RevCol). The main body of RevCol is composed of multiple copies of subnetworks, called columns, between which multi-level reversible connections are employed. This architectural scheme gives RevCol behavior very different from conventional networks: during forward propagation, features in RevCol are gradually disentangled as they pass through each column, while the total information is maintained rather than compressed or discarded as in other networks. Our experiments suggest that CNN-style RevCol models can achieve very competitive performance on multiple computer vision tasks such as image classification, object detection, and semantic segmentation, especially with large parameter budgets and large datasets. For example, after ImageNet-22K pre-training, RevCol-XL obtains 88.2% ImageNet-1K accuracy. Given more pre-training data, our largest model, RevCol-H, reaches 90.0% on ImageNet-1K, 63.8% APbox on the COCO detection minival set, and 61.0% mIoU on ADE20K segmentation. To our knowledge, these are the best COCO detection and ADE20K segmentation results among pure (static) CNN models. Moreover, as a general macro architecture, RevCol can also be introduced into transformers or other neural networks, which is demonstrated to improve performance in both computer vision and NLP tasks. We release code and models at https://github.com/megvii-research/RevCol
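The macro design is a sequence of identical columns in which each level receives both the output of the previous level in the same column and the same-level feature of the previous column. The sketch below only illustrates that wiring with simple additive coupling; the block definitions and the exact reversible formulation used by RevCol are not reproduced here.

```python
import torch
import torch.nn as nn

class Column(nn.Module):
    """One column: a stack of levels, each fused with the previous column's feature."""

    def __init__(self, dim, num_levels=4):
        super().__init__()
        self.levels = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(num_levels)]
        )

    def forward(self, x, prev_column_feats):
        feats = []
        h = x
        for level, prev in zip(self.levels, prev_column_feats):
            # Additive coupling: the level output plus the same-level feature from
            # the previous column (a simplified stand-in for a reversible connection).
            h = level(h) + prev
            feats.append(h)
        return feats

class MultiColumnNet(nn.Module):
    def __init__(self, dim=64, num_columns=3, num_levels=4):
        super().__init__()
        self.num_levels = num_levels
        self.columns = nn.ModuleList([Column(dim, num_levels) for _ in range(num_columns)])

    def forward(self, x):
        feats = [torch.zeros_like(x) for _ in range(self.num_levels)]
        for column in self.columns:
            feats = column(x, feats)        # every column re-reads the raw input
        return feats                        # multi-level features of the last column

net = MultiColumnNet()
outputs = net(torch.randn(2, 64))
```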
We address the theoretical and practical problems related to trajectory generation and tracking control of tail-sitter UAVs. Theoretically, we focus on the differential flatness property with full exploitation of actual UAV aerodynamic models, which lays a foundation for generating dynamically feasible trajectories and achieving high-performance tracking control. We have found that a tail-sitter is differentially flat with accurate aerodynamic models within the entire flight envelope, by specifying the coordinated-flight condition and choosing the vehicle position as the flat output. This fundamental property allows us to fully exploit high-fidelity aerodynamic models in trajectory planning and tracking control to achieve accurate tail-sitter flights. In particular, an optimization-based trajectory planner for tail-sitters is proposed to design high-quality, smooth trajectories with consideration of kinodynamic constraints, singularity-free constraints, and actuator saturation. The planned flat-output trajectory is transformed into a state trajectory in real time, taking environmental wind into consideration. To track the state trajectory, a global, singularity-free, and minimally-parameterized on-manifold MPC is developed, which fully leverages the accurate aerodynamic model to achieve high-accuracy trajectory tracking within the whole flight envelope. The effectiveness of the proposed framework is demonstrated through extensive real-world experiments in both indoor and outdoor field tests, including agile SE(3) flight through consecutive narrow windows requiring specific attitudes, with speeds up to 10 m/s; typical tail-sitter maneuvers (transition, level flight, and loiter) with speeds up to 20 m/s; and extremely aggressive aerobatic maneuvers (Wingover, Loop, Vertical Eight, and Cuban Eight) with accelerations up to 2.5 g.
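A basic step in any flatness-based planner is representing the flat output (here, vehicle position) as a smooth polynomial of time and checking its derivatives against kinodynamic limits. The snippet below illustrates only that generic step with numpy polynomials and arbitrary coefficients; it is not the paper's optimization-based planner or its aerodynamic model.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Flat output: one polynomial per axis over a 4-second segment
# (coefficients in increasing powers of t; values are arbitrary illustrations).
coeffs_x = [0.0, 2.0, 0.5, -0.05, 0.0, 0.0]
coeffs_y = [0.0, 0.0, 0.3, 0.02, 0.0, 0.0]
t = np.linspace(0.0, 4.0, 200)

pos = np.stack([P.polyval(t, coeffs_x), P.polyval(t, coeffs_y)], axis=1)
vel = np.stack([P.polyval(t, P.polyder(coeffs_x)), P.polyval(t, P.polyder(coeffs_y))], axis=1)
acc = np.stack([P.polyval(t, P.polyder(coeffs_x, 2)), P.polyval(t, P.polyder(coeffs_y, 2))], axis=1)

speed = np.linalg.norm(vel, axis=1)
accel = np.linalg.norm(acc, axis=1)
print("max speed %.2f m/s, max accel %.2f m/s^2" % (speed.max(), accel.max()))

# The kind of kinodynamic check a planner would enforce as a constraint:
assert speed.max() <= 20.0 and accel.max() <= 2.5 * 9.81
```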
Traditional multilingual neural machine translation (MNMT) uses a single model to translate all directions. However, with the increasing scale of language pairs, simply using a single model for massive MNMT brings new challenges: parameter tension and heavy computation. In this paper, we revisit multi-way structures by assigning an individual branch to each language (group). Despite being a simple architecture, such de-centralized models are challenging to train due to the lack of constraints aligning representations from all languages. We propose a localized training recipe that maps different branches into a unified space, resulting in an efficient detachable model, Lego-MT. For a fair comparison, we collect data from OPUS and build the first large-scale open-source translation benchmark covering 7 language-centric datasets, each containing 445 language pairs. Experiments show that Lego-MT (1.2B) brings gains of more than 4 BLEU while outperforming M2M-100 (12B). (We will release all training data, models, and checkpoints.)
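Structurally, the idea is to keep one branch per language (group) while training all branches to map into a unified space, so unused branches can be detached at deployment time. A schematic PyTorch sketch of that routing idea (the branch modules, sizes, and alignment details are placeholders, not Lego-MT's actual recipe):

```python
import torch
import torch.nn as nn

class BranchedEncoder(nn.Module):
    """One encoder branch per language, all projecting into a shared space."""

    def __init__(self, languages, vocab_size=32000, dim=512):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, dim)
        self.branches = nn.ModuleDict({
            lang: nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
                num_layers=2,
            )
            for lang in languages
        })

    def forward(self, token_ids, lang):
        # Only the branch of the source language is used, so unused branches
        # can simply be dropped ("detached") when deploying for a subset of pairs.
        x = self.embeddings(token_ids)
        return self.branches[lang](x)

encoder = BranchedEncoder(["en", "zh", "fr"])
hidden = encoder(torch.randint(0, 32000, (2, 16)), lang="zh")   # (2, 16, 512)
```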
Despite the surprising few-shot performance of in-context learning (ICL), it is still common practice to randomly sample examples to serve as context. This paper advocates a new principle for ICL: self-adaptive in-context learning. The self-adaptation mechanism is introduced to help each sample find an in-context example permutation (i.e., selection and ordering) that can derive the correct prediction, thus maximizing performance. To validate the effectiveness of self-adaptive ICL, we propose a general select-then-rank framework and instantiate it with new selection and ranking algorithms. Upon extensive evaluation on eight different NLP datasets, our self-adaptive ICL method achieves a 40% relative improvement over the common practice setting. Further analysis reveals the enormous potential of self-adaptive ICL, suggesting that it might be able to close the gap between ICL and finetuning given more advanced algorithms. Our code is released to facilitate future research in this area: https://github.com/Shark-NLP/self-adaptive-ICL
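The select-then-rank framework can be sketched independently of any particular language model: retrieve a small candidate pool of demonstrations for each test input, then score a few candidate orderings and keep the best one. In the toy sketch below, TF-IDF similarity handles selection and the ranking criterion is passed in as a callback; the paper's actual selection and ranking algorithms are more involved.

```python
from itertools import permutations
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_then_rank(test_input, pool, score_fn, k=3):
    """Pick k nearest demonstrations, then rank their orderings with score_fn.

    score_fn(prompt) should return a scalar where higher means the language
    model is more confident in the correct answer for that prompt.
    """
    vec = TfidfVectorizer().fit(pool + [test_input])
    sims = cosine_similarity(vec.transform([test_input]), vec.transform(pool))[0]
    candidates = [pool[i] for i in np.argsort(sims)[::-1][:k]]

    best_prompt, best_score = None, -np.inf
    for order in permutations(candidates):
        prompt = "\n".join(order) + "\n" + test_input
        score = score_fn(prompt)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt

# Toy usage with a stand-in scorer (a real system would query the LM here).
pool = ["Q: 2+2? A: 4", "Q: capital of France? A: Paris",
        "Q: 3*3? A: 9", "Q: color of sky? A: blue"]
print(select_then_rank("Q: 5+7? A:", pool, score_fn=lambda p: -len(p), k=2))
```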
Humans are sophisticated at reading interlocutors' emotions from multimodal signals such as speech content, voice tone, and facial expressions. However, machines may struggle to understand various emotions because of the difficulty of effectively decoding the complex interactions among multimodal signals. In this paper, we propose a multimodal emotion analysis framework, InterMulti, to capture complex multimodal interactions from different views and identify emotions from multimodal signals. Our proposed framework decomposes signals of different modalities into three kinds of multimodal interaction representations: a modality-full interaction representation, a modality-shared interaction representation, and three modality-specific interaction representations. Additionally, to balance the contributions of different modalities and learn a more informative latent interaction representation, we develop a novel Text-dominated Hierarchical High-order Fusion (THHF) module. The THHF module integrates the above three kinds of representations into a comprehensive multimodal interaction representation. Extensive experimental results on widely used datasets, i.e., MOSEI, MOSI, and IEMOCAP, demonstrate that our method outperforms the state of the art.
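The three kinds of interaction representations named in the abstract (modality-full, modality-shared, and modality-specific) can be illustrated schematically with toy encoders. The sketch below only mirrors that vocabulary; it is not the THHF module, and the layer choices and the number of emotion classes are assumptions.

```python
import torch
import torch.nn as nn

class ToyTrimodalFusion(nn.Module):
    """Schematic of full / shared / specific interaction representations.

    This is only a structural illustration of the terminology in the abstract,
    not the THHF module itself.
    """

    def __init__(self, dim=128):
        super().__init__()
        self.full = nn.Linear(3 * dim, dim)            # all modalities together
        self.shared = nn.Linear(dim, dim)              # one encoder reused by every modality
        self.specific = nn.ModuleDict({
            m: nn.Linear(dim, dim) for m in ("text", "audio", "video")
        })
        self.fuse = nn.Linear(5 * dim, dim)
        self.classify = nn.Linear(dim, 7)              # assumed number of emotion classes

    def forward(self, text, audio, video):
        full = self.full(torch.cat([text, audio, video], dim=-1))
        shared = self.shared(text) + self.shared(audio) + self.shared(video)
        spec = [self.specific[m](x) for m, x in
                (("text", text), ("audio", audio), ("video", video))]
        fused = self.fuse(torch.cat([full, shared] + spec, dim=-1))
        return self.classify(torch.relu(fused))

model = ToyTrimodalFusion()
logits = model(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128))
```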